INTRODUCTION


Since the work of Lombard (1911), we have known that when
speakers talk in the presence of noise, characteristics of their
speech change. Recently, there has been considerable interest in
describing the details of these acoustic-phonetic changes.
Summers, et al. (1988) reported that amplitude, fundamental
frequency, and segment durations increased in the presence of
noise. In addition, they found differences in formant
frequencies and the short-term spectra of vowels. Such changes
were also described by Bond, Moore, and Gable (1989), though we
reported some subject variability in the effects of noise on
segment durations.

The purpose of this study was to extend our understanding of
the effects of noise on speech by examining sentences rather than
isolated words produced while speaking in the presence of a
relatively high level of noise. It is known that the global
increases in fundamental frequency and amplitude found
in isolated words are also found in continuous speech produced in
noise environments (see Lane and Tranel, 1971). What is not
known is whether the segmental and spectral effects observed in
isolated words are also present in connected or continuous
speech.

METHOD


Speakers 

The speakers were four young males, college students at a
Midwestern university. None of the speakers had any history of
speech or hearing difficulties. All were audiometrically
screened to ensure that they had Hearing Threshold Levels of less
than 15 dB. They also served as listeners on a panel
investigating speech intelligibility for the Air Force and
consequently were experienced in speaking in noise environments.
These same four speakers served in an earlier study (Bond, et
al., 1989). The subjects were paid for their participation.

Procedure 

The speakers were recorded in a baseline condition with no
noise exposure and while listening to pink noise over headphones
at 95 dB SPL. Both recordings were made using a military boom
microphone (M-167) while the subjects were seated in an anechoic
chamber. Side tone was adjusted by the speakers to what they
considered a comfortable level in the baseline condition and was
not changed when the speakers were exposed to noise.

The speakers recorded 20 short sentences, taken from the CID
sentence lists (lists E & J, Davis and Silverman, 1978), twice
in each speaking condition, for a total of 80 sentences per
subject. The speakers read the sentences in a relaxed,
relatively casual speaking style.


Data Analysis

Speech analysis was performed using SPIRE (Speech and
Phonetics Interactive Research Environment) on the Symbolics
3670 computer. Each production of each sentence was digitized at
16 kHz with 16 bit resolution. Each segment in each sentence was
labeled using the transcription facility of SPIRE (Cyphers, et
al., 1986). Segment boundaries were located from wide-band
spectrogram and waveform displays, following the criteria
outlined in Peterson and Lehiste (1960). Word boundaries were
also marked. The data set consisted of approximately 850
labeled segments for each speaker in each speaking condition.

The SPIRE parameters of formant frequencies, fundamental
frequency, frication frequency, total energy, and energy in low
and high frequency bands were computed for all segments in each
speaking condition for each subject. These samples were
submitted to the program SEARCH (also developed by the Speech
Processing Group at MIT) so that speech parameters of interest
could be compared in both speaking conditions for any segment or
group of segments. SEARCH allows data sets describing utterances
to be partitioned into user-specified subsets, for example all
stops or all voiceless fricatives. SEARCH also calculates simple
descriptive statistics of SPIRE parameters for phoneme subsets,
e.g., means and standard deviations of the duration of all
fricatives or the frequency of the first formant for all vowels
(see Cyphers, et al., 1986, for further details).
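
As a rough illustration of the kind of partitioning and summary
statistics described above, the following Python sketch selects a
user-specified phoneme subset from a list of labeled segments and
reports the mean and standard deviation of duration in each speaking
condition. It is not SEARCH itself: the segment tuples, the
`subset_stats` helper, and all duration values are hypothetical and
serve only to show the idea.

```python
from statistics import mean, stdev

# Hypothetical labeled segments: (phoneme label, condition, duration in ms).
# These values are made up for illustration; they are not data from the study.
SEGMENTS = [
    ("s", "ambient", 110.0),
    ("s", "noise", 118.0),
    ("f", "ambient", 95.0),
    ("f", "noise", 101.0),
    ("i", "ambient", 80.0),
    ("i", "noise", 86.0),
    ("z", "ambient", 70.0),
    ("z", "noise", 75.0),
]

# Hypothetical user-specified subset: the voiceless fricatives in the toy data.
VOICELESS_FRICATIVES = {"s", "f"}


def subset_stats(segments, labels, condition):
    """Return (mean, standard deviation) of duration for one phoneme subset."""
    durations = [
        dur for label, cond, dur in segments
        if label in labels and cond == condition
    ]
    return mean(durations), stdev(durations)


ambient_mean, ambient_sd = subset_stats(SEGMENTS, VOICELESS_FRICATIVES, "ambient")
noise_mean, noise_sd = subset_stats(SEGMENTS, VOICELESS_FRICATIVES, "noise")
print(f"ambient: {ambient_mean:.1f} ms (SD {ambient_sd:.1f})")
print(f"noise:   {noise_mean:.1f} ms (SD {noise_sd:.1f})")
```

The same helper could be applied to any other parameter (for example
a first-formant value per vowel) by storing that value in place of
the duration.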

RESULTS 

Fundamental Frequency

As in almost all previous investigations, the read
sentences were found to be higher in pitch when the speakers were
exposed to noise than when they were speaking in the baseline
condition. The fundamental frequency, taken at the mid-point of
all vowels in the sample, increased for each of the four speakers
in noise. The distributions of the fundamental frequencies are
given in Fig. 1 for each speaker. The smallest average
fundamental frequency (Fo) increase was 13 Hz for S4; the
greatest was 48 Hz for S2. Averaged for all four speakers, Fo
increased 25 Hz, approximately a 26 percent increase. There was
also a tendency for the variability of Fo to increase for speech
produced in the presence of noise.

Total Energy

Total energy also increased for all four speakers in the
presence of noise. Total energy values per speaker, averaged for
all vowels in the sample, are given in Fig. 2. Total energy is
measured using SPIRE in terms of dB down from a reference. The
largest total energy increase, 11 dB, was found for S2, the
speaker who also exhibited the greatest increase in fundamental
frequency. Averaged for four speakers, the total energy increase
was 7 dB. In general, increases in total energy and fundamental
frequency were correlated. Increases in total energy were
associated not only with vowels but with all other segments for
which energy could be measured.

Spectral Tilt

The spectrum of speech produced in noise has also been found
to be characterized by a relative increase in energy in high
frequencies in comparison with lower frequencies, that is, by a
change in spectral tilt. In order to evaluate the read sentences
for this possibility, the energy in a low-frequency band (300-600
Hz) and a high-frequency band (2000-3000 Hz) was calculated for
all vowels. Since total energy increased with noise, energy
would be expected to increase in both energy bands as well. The
increase in the low-frequency band averaged 6.9 dB for all four
speakers while the energy in the high-frequency band increased
almost 10 dB. For all four speakers, there was a tendency for
more energy to be present at higher frequencies for speech
produced in noise.
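
The band comparison described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
source does not describe how SPIRE computes band energy, so a plain
discrete Fourier transform over a single 512-sample frame is used
here, and the two-tone test signals and their amplitudes are
invented. Only the band edges and the 16 kHz sampling rate are taken
from the text.

```python
import cmath
import math

SAMPLE_RATE = 16000       # Hz, the digitization rate reported under Data Analysis
LOW_BAND = (300, 600)     # Hz, low-frequency band from the text
HIGH_BAND = (2000, 3000)  # Hz, high-frequency band from the text


def band_energy_db(frame, band, sample_rate=SAMPLE_RATE):
    """Energy in dB (arbitrary reference) of the DFT bins inside `band`."""
    n = len(frame)
    first_bin = math.ceil(band[0] * n / sample_rate)
    last_bin = math.floor(band[1] * n / sample_rate)
    energy = 0.0
    for k in range(first_bin, last_bin + 1):
        # Direct DFT of bin k: slow, but dependency-free for one short frame.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        energy += abs(coeff) ** 2
    return 10.0 * math.log10(energy)


def toy_vowel(low_amp, high_amp, n=512):
    """Invented test signal: a 500 Hz tone plus a 2500 Hz tone."""
    return [
        low_amp * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
        + high_amp * math.sin(2 * math.pi * 2500 * i / SAMPLE_RATE)
        for i in range(n)
    ]


# Hypothetical "ambient" and "noise" tokens; the amplitudes are assumptions.
ambient = toy_vowel(1.0, 0.1)
noise = toy_vowel(2.0, 0.3)

low_gain = band_energy_db(noise, LOW_BAND) - band_energy_db(ambient, LOW_BAND)
high_gain = band_energy_db(noise, HIGH_BAND) - band_energy_db(ambient, HIGH_BAND)
tilt_change = high_gain - low_gain

print(f"low band gain:  {low_gain:.2f} dB")
print(f"high band gain: {high_gain:.2f} dB")
print(f"tilt change:    {tilt_change:.2f} dB")
```

With these invented amplitudes the low band gains about 6.0 dB and
the high band about 9.5 dB, so the positive difference between the
two gains plays the role of the change in spectral tilt discussed in
the text.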


Durations 


The overall noise effects on word and segment durations in
read sentences were variable for the four subjects. For two
subjects, the average durations of all words decreased in noise,
by 41 ms for S1 and 14 ms for S3. For the other two speakers,
average word durations increased, by 18 ms for S4 and by 5 ms for
S2.

For three speakers (S2, S3, S4) the average durations of all
vowels increased by a very small amount, from 3 to 15 ms. For
S1, average vowel durations decreased by 15 ms. The tendencies
found for all vowels were also present for vowel subsets such as
inherently long and short vowels and diphthongs. In general, the
longer the vowel, the more it tended to increase in duration.
The magnitude of the effect of noise on vowel durations, however,
was clearly small and statistically non-significant. The
distributions of vowel durations for all four subjects are given
in Fig. 3.

Frication Frequency

In SPIRE, frication frequency is defined as the most
prominent frequency in noisy portions of the speech signal.
Averaged across all fricatives and all subjects, frication
frequency increased by 370 Hz, or approximately 18 percent. The
values for each speaker are shown in Table 1.

Vowel Formants


Values for the first and third formants averaged across all
vowels for each speaker are given in Table 2. The most
consistently reported effect of noise on the formant structure of
vowels has been an increase in the frequency of the first
formant. This effect was present and can be seen both for
individual vowels and globally. Averaged for all vowels in the
sample, the increase in the first formant ranged from a maximum
of 71 Hz (S2) to a minimum of 10 Hz (S3). When averaged for all
four subjects, the first formant increased 34 Hz.

The second most consistent vowel formant shift affected the
third formant. On the average, the third formant was lower in
speech produced in noise for all four subjects. Averaged for all
vowels, the decrease in the third formant ranged from 140 Hz for
S1 to 50 Hz for S3. The average for all four subjects was a
decrease of 88 Hz.

Second formant values averaged across all vowels are not
reported because previous work suggests that the effects of noise
on the second formant may vary from vowel to vowel (Bond, et
al., 1989).

Figure 4 shows the average center frequencies of F1 and F2
for the four vowels /i, ae, a, u/, which represent the corners of
the traditional vowel quadrilateral, produced under both ambient
and noise conditions. As has been reported for isolated words,
the major effect of speaking in the presence of 95 dB pink noise
is an upward shift in the frequency of F1. As we also observed in
the case of isolated words, F2 for /i/ shows a slight decrease in
frequency while it remains essentially unchanged for /ae/ and
/a/. The major difference between the results noted in the vowel
F1-F2 plots for sentences and those reported for isolated words
occurred with /u/. In the isolated word condition, words spoken
by the same four talkers showed an upward shift of F2 for
/u/ when spoken in the presence of noise; in the sentences, F2 for
/u/ decreased slightly when spoken in noise relative to the
ambient condition. The more striking difference, however, was a
significant increase in F2 for /u/ when embedded in a sentence as
opposed to when in an isolated word. In isolated words, the
average F2 value for /u/ produced by the four talkers in ambient
conditions was about 1000 Hz. When the same four talkers under
the same conditions read sentences, the average F2 value for /u/
was around 1650 Hz. Fokes and Bond (1986) have noted that there
is a tendency for American talkers to produce /u/ with a higher
second formant in sentence context than when the same vowel
appears in isolated words. However, the difference they noted
was not as pronounced as that found here.


DISCUSSION 

The changes of speech with noise observed in sentences are
consistent with our previous findings dealing with isolated words
and also with the general tendencies reported in the literature.
First, duration changes for words and segments are small and
inconsistently present. They do not appear to be systematic
enough to attribute to the noise environment, though possibly S1
is an exception.

Second, increases in pitch frequency and total energy as
well as in frication frequency are present for all speakers.
These changes probably result from increased vocal effort. When
in the noisy environment, the speakers try to increase the
loudness of their speech to a level they feel appropriate. The
changes in spectral tilt would be an expected consequence of
increased vocal effort as well.

Third, the formant changes are also generally consistent
with previous work. The increase of F1 may be a consequence of
restricted tongue movement caused by the more open mouth position
associated with loud speech. However, an explanation for the
systematic decrease in F3 is not entirely clear. A low F3 is
associated with a mid-palatal constriction, at least in the
production of rhotacized vowels (Pickett, 1980). Whether a
palatal constriction is responsible for the observed F3 decreases
or whether they result from some other speech production
mechanism, perhaps pharyngeal stiffening, is not clear on the
basis of this research. That pharyngeal stiffening may be
responsible for the F3 shift is suggested by a finding of Butcher
and Ahmad (1987), who report a lowering of F3 by approximately
200 Hz in the environment of the pharyngeal consonants of Iraqi
Arabic.

Finally, it has been noted (Bond, et al., 1989; Moore and
Bond, 1987; Summers, et al., 1988) that many of the changes
observed in speech produced in noise may reflect articulatory
changes made to increase vocal effort and to articulate more
precisely in order to enhance communication in an interfering
environment. Indeed, it has been shown that for equivalent
signal-to-noise ratios, speech produced in noise is more
intelligible than speech produced in quiet (Dreher and O'Neill,
1957; Summers, et al., 1988). In addition, we have conducted
listening tests using the isolated words spoken by these same
four talkers (Bond and Moore, 1989) and found that the words
produced in noise were more intelligible at equivalent
signal-to-noise levels for both native and non-native speakers of
English, with the non-native speakers of English showing the
greater increase in intelligibility.

TABLE 1. FRICATION FREQUENCY (Hz)

SUBJECT    AMB.     NOISE    CHANGE
1          1820     2130     310
2          1930     2250     320
3          2070     2460     390
4          2310     2770     460
Average    2032.5   2402.5   370
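
The averages in Table 1, and the figure of approximately 18 percent
quoted in the text, can be checked with a few lines of Python. The
numbers below are copied from the table; the variable names are
chosen here for illustration, and the percentage is assumed to be
taken relative to the average ambient value, which the text does not
state explicitly.

```python
# Ambient and noise frication frequencies (Hz) for subjects 1-4, from Table 1.
AMBIENT = [1820, 1930, 2070, 2310]
NOISE = [2130, 2250, 2460, 2770]

# Per-subject change and the across-subject averages.
changes = [n - a for a, n in zip(AMBIENT, NOISE)]
mean_ambient = sum(AMBIENT) / len(AMBIENT)
mean_noise = sum(NOISE) / len(NOISE)
mean_change = sum(changes) / len(changes)

# Assumption: percent change is relative to the average ambient value.
percent_change = 100.0 * mean_change / mean_ambient

print(changes)
print(mean_ambient, mean_noise, mean_change)
print(round(percent_change, 1))
```

The computed averages match the last row of the table, and the
relative increase comes out at roughly 18 percent.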

TABLE 2

                 F1 (Hz)                   F3 (Hz)
SUBJECT    AMB.    NOISE   CHANGE    AMB.    NOISE    CHANGE
1          447     473     26        2330    2190     -140
2          448     519     71        2390    2290     -100
3          433     443     10        2380    2330     -50
4          506     533     27        2540    2480     -60
Average    458.5   492     33.5      2410    2322.5   -87.5

Fig. 1. Distribution of pitch frequency in both speaking
conditions for four speakers. The abscissa is in Hz.

Fig. 3. Distribution of vowel durations in both speaking
conditions for four speakers. The abscissa is in seconds.

F1 - F2 VOWEL SPACE

Fig. 4. The space defined by F1-F2 for the front vowels /i, ae/
and for the back vowels /u, a/, in two conditions.